Abstract: Network coding enables the deployment of distributed packet
delivery algorithms that locally adapt to the network availability 
in media streaming applications. However, it may also increase 
delay and computational complexity if it is not implemented 
efficiently. We address here the effective placement of nodes that 
implement randomized network coding in overlay networks, so 
that the goodput is kept high while the delay for decoding stays 
small in streaming applications. We first estimate the decoding 
delay at each client, which depends on the innovative rate in 
the network. This estimate makes it possible to identify the nodes that
have to perform coding for a reduced decoding delay. We then
propose two iterative algorithms for selecting the nodes that 
should perform network coding. The first algorithm relies on the 
knowledge of the full network statistics. The second algorithm 
uses only local network statistics at each node. Simulation results 
show that large performance gains can be achieved with the 
selection of only a few network coding nodes. Moreover, the
second algorithm performs very close to the centralized estimation
strategy, which demonstrates that the network coding nodes can
be selected efficiently in a distributed manner. Our scheme shows
large gains in terms of achieved throughput, delay and video
quality in realistic overlay networks when compared to methods
that employ traditional streaming strategies as well as algorithms
that select the network coding nodes at random.

Index Terms: Network coding, delay minimization, throughput
maximization, overlay networks.

I. Introduction

The recent development of overlay networks offers interesting
perspectives for multimedia streaming applications, since
network diversity can be used advantageously for improved
quality of service. Traditional streaming systems based
on ARQ or channel coding techniques, however, generally
fail to exploit this diversity efficiently. They either suffer
from relatively high computational cost, require coordination
between network nodes, or lead to suboptimal performance
in large scale networks where local channel conditions are
hard to estimate. A different paradigm has been initiated
recently with network coding [1], [2], where some processing
is requested from the network nodes in order to improve
the packet delivery performance. Specifically, network coding
nodes combine buffered packets before forwarding them to
next hop nodes. This coding strategy is particularly appealing
in distributed streaming systems, as it removes the need for
reconciliation between nodes. It locally adapts to the available
bandwidth and packet loss rate and can even approach the
max-flow min-cut bound of the network graph. Overall,
network coding systems have shown improved resiliency to
dynamics, delays, scalability and buffer capacities in networks
with diversity [3].

The application of network coding algorithms in multimedia
streaming systems is however not straightforward. Specifically,
multimedia streaming imposes strict timing constraints that
impact the design of network coding algorithms. A practical
network coding system that addresses the specific characteristics
of streaming applications has been presented in [4].
It implements randomized network coding (RNC) techniques
[5] in the network nodes and devises a protocol to deal with
buffering issues and timing constraints. Moreover, it introduces
the concept of generations, which restricts coding operations
to packets that share similar decoding deadlines. However,
network coding still faces important issues in practical
systems due to the decoding delays imposed by successive
network coding operations. This delay, as well as the
computational overhead in the system, grows with the number of
network coding nodes. It therefore becomes important to select
efficiently the subset of nodes that perform network coding,
in order to control delay and complexity while still exploiting
the diversity in the network.

In this paper, we discuss solutions for the selective placement
of a few network coding nodes in order to reduce the
delay for video delivery. The nodes in the network are categorized
into network coding (NC) and store and forward (SF)
nodes. The network coding nodes use the practical network
coding algorithm described in [4], which has been selected
for its effectiveness and simplicity. Similarly, we adopt the
concept of coding generation and the buffer models of [4] for proper
handling of the timing constraints in the stream delivery. We
first build on our previous work [6] and estimate the rate of
non-redundant packets in each network node. This rate is an
indication of the goodput of the system, as it measures the
number of useful and non-redundant packets that are received
at a node. It makes it possible to estimate the decoding delay in
the client nodes, which corresponds to the time necessary for
collecting enough useful packets to build a full rank decoding system.
We use the delay estimation to select the subset of nodes
that should implement network coding such that a maximal
goodput or a minimal delay is attained. We propose two
algorithms that iteratively choose the SF nodes to be turned
into NC nodes for improving the system performance. The two
algorithms differ in their view of the network status. The first
algorithm assumes that a central node has complete knowledge
about the status of the overlay network in terms of available
bandwidth and packet loss rate. The second algorithm only
uses local estimations of the network status at each node and
provides a solution for distributed systems. The simulation
results show that the proper selection of only a few network
coding nodes already leads to throughput gains that come close
to the max-flow min-cut bound and greatly decreases the delay
necessary for media stream delivery. Moreover, the algorithm
that only considers local network statistics performs very
competitively with the algorithm that uses full knowledge of
the network topology. Both algorithms even select the same
nodes for network coding in most cases. Furthermore,
they both outperform basic streaming algorithms built on store
and forward approaches as well as solutions where network
coding nodes are selected randomly. These observations are
confirmed in realistic overlay networks, where our method can
improve the users' video experience even when only a
few nodes implement randomized network coding. This is due
to a good balance between decoding delay and efficient use
of the network diversity. Finally, minimal network knowledge
is often sufficient for determining an efficient positioning of
the network coding nodes.

The paper is organized as follows. In Section II, we present
the framework under consideration and briefly overview the
network coding principles. In Section III, we describe the
model of the SF node buffer that is later used for delay
computation. Then we present in Section IV a methodology
for estimating the useful flow rate in the network nodes
as well as the decoding delay. The centralized and
semi-distributed algorithms for selecting the network coding nodes
are presented in Section V. Simulation results are presented in
Section VI, where the benefits of the proposed algorithms are
evaluated for video streaming applications in various realistic
network cases. Finally, the related work is discussed in Section
VII and conclusions are drawn in Section VIII.

II. Network Coding Framework 

We consider a streaming system that consists of servers,
clients and intermediate nodes, as illustrated in Fig. 1. The
overlay network offers source and path diversity, which can
be efficiently exploited with network coding techniques that
randomly combine packets in the nodes. This increases the
packet diversity in the network and leads to efficient exploitation
of the channel resources without the need for complex
scheduling or node coordination mechanisms [1]. The network
is modeled by a directed acyclic graph $G = (V, E)$,
where $V$ is the set of network nodes and $E$ is the set of edges
(links) in the network. Each network link between nodes $u$ and
$v$ is characterized by a bandwidth $b_{u,v}$ (expressed in
packets per second) as well as a packet loss rate $\pi_{u,v}$. We
assume that all servers transmit the same multimedia content
to clients via intermediate nodes that can be either network
coding (NC) or "store and forward" (SF) nodes. We consider
that the intermediate nodes are not necessarily interested in the
transmitted content, but rather act as helper nodes and assist
the packet delivery system. The system implements a push-based
packet delivery strategy that involves lower communication
and coordination overhead than a pull-based solution.
We consider that the servers can also implement randomized
coding on the source packets for improved robustness. The
coded packets are then pushed to the clients through the
successive intermediate nodes. Finally, the clients perform
network decoding after receiving enough packets to build a
full rank decodable system of packets.

Fig. 1: Illustration of a system for streaming on overlay
networks. Multiple streaming servers (SS) send information
to clients on a lossy packet network via intermediate nodes
that can be either network coding (NC) or 'store and forward'
(SF) peers.

The SF nodes simply send at each transmission opportunity
the first packet in their buffer that has not been sent
previously. The buffer is managed in a first-in-first-out manner,
where the oldest packets are replaced by new ones when the
buffer is full. When the outgoing bandwidth is larger than
the incoming capacity, an SF node sends random replicates of
packets from its buffer. On the other hand, the intermediate
nodes that perform network coding randomly combine the
buffered packets in order to generate network coded packets
that are further transmitted to neighbor nodes. As suggested
in [4], the NC nodes first check whether the received packets
are innovative, i.e., whether they carry novel information.
Non-innovative packets are
discarded immediately as they do not increase the symbol
diversity in the network. Then the NC nodes randomly
combine the remaining packets with coding operations based
on randomized network coding (RNC) [7], which is a simple and
efficient network coding solution for distributed systems. RNC
works similarly to rateless codes [8], [9] and can generate
an arbitrary number of coded packets from a given set of
source packets, which provides a means for simple bandwidth
adaptation.

Formally, the network coding operations performed in a peer
node can be written as follows. An NC node $u$ generates $M$
packets by RNC. The $m$-th network coded packet $c_m$ is of the
form

$$c_m = \sum_{i=1}^{|\mathcal{N}(u)|} f_{m,i} \cdot p_i(u)$$

where $\mathcal{N}(u)$ corresponds to the set of packets of the same
generation that are available at node $u$, $p_i(u)$ denotes either
a network coded packet or a native (uncoded) packet, and
$f_{m,i}$ is a random coefficient over the Galois field of size $q$,
GF($q$). The size of the Galois field is typically set to $q = 256$,
as it has been shown in [4] that this guarantees high symbol
diversity and low probability of building duplicate packets. As
the packets combined in a node are actually combinations of
the original data packets, the encoded packets can be expressed
as a function of the native packets

$$c_m = \sum_{i=1}^{N} \tilde{f}_{m,i} \cdot n_i \tag{1}$$

where $N$ is the total number of native packets, e.g., for video
transmission $N$ can be the number of video packets. The
parameters $n_i$ and $\tilde{f}_{m,i}$ represent respectively the native
packets and their corresponding coding coefficients after random
network coding operations. It is worth noting that some of the
coefficients $\tilde{f}_{m,i}$ can be zero, which means that $c_m$ does not
contain information about the native packet $n_i$. As the coding
coefficients are chosen randomly, a header of constant length is
appended to each packet with coefficient information, so that
the receiver can decode the stream and recover the original
data packets. A network coded packet is thus augmented
with a header containing the vector of coding coefficients.
Fortunately, the header does not grow with the number of hop
transmissions due to Eq. (1). The encoding procedure in a peer
node is depicted in Fig. 2.

Fig. 2: An NC node combines incoming packets $p_i$ and generates
network coded packets $c_m$. A header $f_m$ is appended to
each coded packet and carries the coding coefficients.
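The encoding step can be illustrated with a minimal Python sketch. It is not the implementation of [4]: the reduction polynomial 0x11B for GF(256), the byte-string packet representation and the names `gf256_mul`, `combine` and `rnc_encode` are all assumptions made for illustration. Each buffered packet is assumed to carry a header of $N$ coefficients with respect to the native packets, and the same random coefficients are applied to headers and payloads.

```python
import random


def gf256_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements (reduction polynomial 0x11B is an assumed choice)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def combine(packets, coeffs):
    """Return sum_i coeffs[i] * packets[i], computed bytewise in GF(2^8)."""
    out = bytearray(len(packets[0]))
    for f, packet in zip(coeffs, packets):
        for j, byte in enumerate(packet):
            out[j] ^= gf256_mul(f, byte)
    return bytes(out)


def rnc_encode(payloads, headers, rng):
    """Draw random coefficients and code headers and payloads identically.

    Returns (header, payload) of one new network coded packet.
    """
    coeffs = [rng.randrange(256) for _ in payloads]
    return combine(headers, coeffs), combine(payloads, coeffs)
```

Because the header of a re-encoded packet is again a vector of $N$ coefficients over the native packets, its length stays constant however many NC nodes the packet traverses.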

The decoding operations at the client basically consist in
solving the system of equations that corresponds to the network
coding operations. Upon collecting a network coded packet,
the client stores it in a buffer and adds a row to a matrix $F$
that contains the coding coefficients. When a full rank system
is collected, the original packets are reconstructed by solving
the following system of equations

$$\mathbf{c} = F \cdot \mathbf{n}$$

where $\mathbf{c}$ and $\mathbf{n}$ are respectively vectors with the coded and
source packets. The solution of this system is typically
computed by Gaussian elimination [4].
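As a rough illustration of the decoding step, the following Python sketch solves the linear system by Gauss-Jordan elimination over GF(256). The field representation (reduction polynomial 0x11B), the inversion by exponentiation and the function names are assumptions for illustration and are not taken from [4].

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements (reduction polynomial 0x11B is an assumed choice)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf256_inv(a: int) -> int:
    """Invert a non-zero element using a^254 = a^(-1), since a^255 = 1."""
    result = 1
    for _ in range(254):
        result = gf256_mul(result, a)
    return result


def decode(coeff_rows, payloads):
    """Recover the native payloads from coded ones, or return None if rank-deficient.

    coeff_rows[m] holds the header coefficients of coded packet m,
    payloads[m] its payload bytes.
    """
    n = len(coeff_rows[0])
    rows = [list(f) + list(c) for f, c in zip(coeff_rows, payloads)]
    for col in range(n):
        # Find a row with a non-zero entry in this column to use as pivot.
        pivot = next((r for r in range(col, len(rows)) if rows[r][col]), None)
        if pivot is None:
            return None  # not enough innovative packets yet
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = gf256_inv(rows[col][col])
        rows[col] = [gf256_mul(inv, x) for x in rows[col]]
        # Eliminate the column from every other row (addition is XOR).
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [x ^ gf256_mul(factor, y)
                           for x, y in zip(rows[r], rows[col])]
    return [bytes(rows[i][n:]) for i in range(n)]
```

Returning `None` for a rank-deficient system mirrors the requirement that the client waits until a full rank system has been collected.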

The proposed streaming system leads to the following
observations. First, the network coding nodes act somewhat
like sources, in the sense that they refresh the set of
packets in the network by coding operations. This is necessary
as there is a non-zero probability for the reception of duplicate
packets in the network nodes when the network is mostly
composed of SF nodes. These duplicates can be generated by
a node that does not receive enough diverse packets, or by
different nodes that independently transmit identical packets.
These duplicate packets significantly lower the stream delivery
performance, especially in networks containing bottlenecks.
However, the careful placement of a few network coding nodes
in the overlay can help to reduce the number of duplicates in
the network. If the number of network coding nodes
becomes too large, however, the probability for the randomized network
coding operations to generate duplicate packets becomes again
non-negligible. This is especially the case if coding operations
are restricted to small generations due to delay constraints. As
redundancy, delay and computational overhead might increase
with the number of coding nodes, it becomes quite apparent
that efficient systems should not implement network coding
in every overlay node. Instead, one has to find an effective
placement of network coding nodes in order to fully exploit
the network diversity in overlay streaming applications.

Fig. 3: Packet replication procedure followed in an SF node
after the reception of the $h$-th innovative packet.

III. SF BUFFER MODEL 

We provide in this section a buffer model for the SF nodes, 
which forward and possibly replicate packets if the outgoing 
bandwidth is sufficient. The buffer model is used to estimate 
the rate of replicated packets, which is an important parameter 
in the computation of the decoding delay in the receivers. 

As illustrated in Fig. 3, we consider that each SF node has
two buffers of capacity $h$ (in packets): the Main Buffer (MB),
where the incoming packets are stored, and the Copies Buffer
(CB), where copies of the packets that have been recently
transmitted are stored. Both buffers follow a FIFO model, as the
oldest packets are overwritten by the new ones when the node's
buffer capacity is exceeded. In addition, since our system
works with deadline-constrained data, the packets are removed
from the buffers when their decoding deadline expires.

The buffering process works as follows: when a packet
arrives in an SF node, it is stored in MB. When the SF node
has a transmission opportunity, a packet from MB is sent and
thus removed from MB. A copy of this packet is kept in CB.
Whenever MB is empty and the node has other transmission
opportunities, it randomly selects a packet from CB and
transmits it. In other words, the node transmits packet replicates
when the outgoing bandwidth is sufficient. The packets in
CB are overwritten after some time by newer packets. In
our model, if an SF node does not have sufficient outgoing
bandwidth for replication, it does not use CB. Alternatively, if
the outgoing bandwidth is large, CB is used extensively and
MB is often empty.
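The buffering process described above can be sketched in a few lines of Python. This is only an illustrative model: the class name `SFNode`, its methods and the use of bounded deques for the two FIFO buffers are assumptions, and deadline-based removal is left out for brevity.

```python
import random
from collections import deque


class SFNode:
    """Store-and-forward node with a Main Buffer (MB) and a Copies Buffer (CB)."""

    def __init__(self, capacity, rng=None):
        # A bounded deque drops its oldest entry when a new one is appended,
        # which matches the FIFO overwrite behaviour of both buffers.
        self.mb = deque(maxlen=capacity)
        self.cb = deque(maxlen=capacity)
        self.rng = rng or random.Random()

    def receive(self, packet):
        """Store an incoming packet in MB."""
        self.mb.append(packet)

    def transmit(self):
        """Use one transmission opportunity; return the packet sent, or None."""
        if self.mb:
            packet = self.mb.popleft()  # oldest packet not yet sent
            self.cb.append(packet)      # keep a copy for later replication
            return packet
        if self.cb:
            # MB is empty: send a replicate chosen uniformly from CB.
            return self.rng.choice(list(self.cb))
        return None
```

With a small outgoing bandwidth, `transmit` is called less often than `receive` and CB replicates are never sent; with a large outgoing bandwidth, most calls find MB empty and fall through to the replication branch.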

We are now interested in computing the number of packet
replicates generated by the SF node $u$ under the proposed
buffer model. A priori, the average number of replicates per
packet $R(u)$ at node $u$ is given by the ratio of the outgoing and
incoming bandwidths $b_o(u)$ and $b_i(u)$, respectively. We can
write $R(u) = b_o(u)/b_i(u)$. However, the probability for a packet to
be replicated depends on the order of its arrival at the SF node.
Typically, a packet that arrives early spends more time in the
buffer and thus has a higher probability to be replicated than
a packet that arrives late and close to the decoding deadline.
We thus consider three cases depending on the order of arrival
of the packets. We denote the arrival by the position $k$ of
a packet in a coding generation. The first case includes the
earliest packets, which reach the node while CB is not full.
Every packet is replicated with a uniform probability in CB,
but the packets that stay longer in the buffer have a higher
chance to be replicated. The number of copies for the $k$-th
packet is thus given by:

$$R_k(u) = 1 + (R(u) - 1) \left( \sum_{j=k}^{h} \frac{1}{j} + (k - 1) \frac{1}{h} \right), \quad \text{for } k \in [1, h] \tag{2}$$

The second case corresponds to packets that reach the node
while CB is full. When CB is full, each new packet overwrites
the oldest packet in CB. Each packet has a lifetime in CB that
corresponds to the time necessary to collect $h$ new packets
in the SF node. The replication probability is equal to $1/h$
and the number of copies is then equivalent to the average
replication rate in the node. In this stationary mode, we have

$$R_k(u) = R(u), \quad \text{for } k \in [h + 1, K] \tag{3}$$

where $K$ is the number of packets that fully traverse CB
until the head of the buffer's queue. Finally, the third case
corresponds to the packets that do not spend a full lifecycle
in CB due to the expiration of the decoding deadline. When
the decoding deadline expires, CB is flushed, and the packets
in the buffer at that moment have fewer opportunities to be
replicated, since they do not fully traverse the buffer. If
$k' = k - K$ denotes the position of the packets in CB when
the buffer is flushed, the number of copies of the late packets
can then be written as

$$R_k(u) = 1 + (R(u) - 1) \cdot \frac{h - k'}{h}, \quad \text{for } k \in [K + 1, |\mathcal{N}(u)|] \tag{4}$$

where $|\mathcal{N}(u)|$ is the number of packets of the same coding
generation that reach the node $u$.

In summary, two main factors affect the packet replication
rate: the FIFO behavior of the CB buffer and the expiration of
the decoding deadline, which causes the deletion of packets in the
buffer. Thus, the first $h - 1$ packets are replicated more than
average and the last packets are replicated less than average,
while the intermediate packets have a constant replication rate.
Finally, it should be noted that, depending on the bandwidth
value and the delay constraints, there are situations where the
buffer does not reach the stationary regime of Eq. (3) and
the computation of the number of replicates shall be adapted
accordingly.

We use the above buffering model to compute an equivalent
packet replication rate $\tilde{R}(u)$ for all packets in an SF node, which
is more precise than the average value $R(u) = b_o(u)/b_i(u)$. The
equivalent replication rate is estimated so that the number of
packets at the client $c$ is preserved with respect to the case
where the packet replication rate is computed independently
for each packet. We assume that each packet travels independently
to the client $c$, and we pose the following equivalence
condition:

$$\sum_{k=1}^{|\mathcal{N}(u)|} \left( 1 - e_c(u)^{R_k(u)} \right) = |\mathcal{N}(u)| \cdot \left( 1 - e_c(u)^{\tilde{R}(u)} \right), \tag{5}$$

where $e_c(u)$ is the probability of losing a packet between
the node $u$ and the client $c$. The number of copies $R_k(u)$
is computed from Eqs. (2), (3) and (4). Rewriting the above
equation, we can express the equivalent replication rate as

$$\tilde{R}(u) = \frac{\log \left[ 1 - \frac{1}{|\mathcal{N}(u)|} \sum_{k=1}^{|\mathcal{N}(u)|} \left( 1 - e_c(u)^{R_k(u)} \right) \right]}{\log(e_c(u))} \tag{6}$$

We use this replication rate estimate in the computation of
the decoding delay in the next section.
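The equivalent replication rate can be evaluated numerically with a short Python sketch. It assumes the reading of the equivalence condition in which a packet with $R_k(u)$ copies is lost only when all of its copies are lost, so that it is delivered with probability $1 - e_c(u)^{R_k(u)}$. The function name and argument layout are hypothetical.

```python
import math


def equivalent_replication_rate(copies, loss_prob):
    """Single replication rate that preserves the expected number of delivered packets.

    copies: per-packet numbers of copies R_k(u); loss_prob: e_c(u), with 0 < e_c(u) < 1.
    """
    n = len(copies)
    # Average delivery probability over all packets of the generation.
    delivered = sum(1.0 - loss_prob ** r for r in copies) / n
    return math.log(1.0 - delivered) / math.log(loss_prob)
```

When all packets have the same number of copies, the function returns that number. When the copies are spread unevenly, the result is smaller than their arithmetic mean, because $e_c(u)^{R}$ is convex in $R$.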

IV. Delay analysis 

A. Estimation methodology 

Our objective is to minimize the decoding delay by the
proper placement of NC nodes in the overlay. The decoding
delay is the time required to gather a sufficient number of
innovative packets for decoding. In the analysis below we
restrict our attention to cases where a full rank system is
built at the decoder and estimate the delay necessary for this
situation to happen. We further construct our analysis for one
coding generation, while the extension to multiple generations
is straightforward. The decoding delay depends on the rate of
innovative packets at the client. The innovative rate increases
monotonically with the number of useful packets at the client
[4], which corresponds to the number of different packets that
reach the client. Hence, a higher rate of useful packets leads
to a smaller decoding delay.

Fig. 4: Flow decomposition of the network graph by the proposed algorithm. The top (red) node and the bottom (green) node
are respectively the source and the client of the considered topology. The SF and NC nodes are represented respectively by
the big and small black nodes. Flow from the (a) source, (b) first NC node, (c) second NC node, (d) third NC node, and (e)
fourth NC node.

In order to compute the decoding delay, we consider that SF 
nodes replicate packets in case of large outgoing bandwidth. 
We further consider that new packets are generated only at 
sources and NC nodes. We treat these nodes independently in 
the computation of the delay at the client as illustrated in Fig. 
4. We assume that the probability of generating two identical 
packets in the sources or NC nodes is negligible due to the 
large size of the Galois field. In more detail, we first estimate
the delay observed by the client when packets are sent from a
given source through all the paths connecting this source to 
the client, except for the paths that traverse NC nodes (see 
Fig. 4a). Next, we consider the NC node that is the closest 
to the source and all the paths that connect it to the client, 
except for those passing through other NC nodes (see Fig. 
4b). Similarly, all other NC nodes are considered only when 
all their parent NC nodes have been visited. This procedure is 
repeated for the unprocessed NC nodes and the corresponding 
graphs are shown respectively in Figs. 4c, 4d and 4e. The total 
delay is computed under the assumption that all sources and 
NC nodes send independent streams. Equivalently, we assume 
that the total useful flow is equal to the sum of the useful 
flows generated by the source and all the NC nodes. 

As we consider lossy network paths, we have to consider
that some of the packets generated by network nodes do not
reach their destination. We thus estimate the probability $e_c(u)$
that a packet sent by the node $u$ does not reach the client $c$.
This probability is computed on the subgraph that contains all
the paths connecting the node $u$ with the client $c$ but excludes
the paths that traverse other NC nodes, since these typically
alter the set of received packets. The corresponding subgraphs
are colored in red in Fig. 4. We denote by $D(u)$ the set of
children of node $u$ (excluding the NC nodes) in the subgraph
and we define

$$p_{u,v} = \frac{b_{u,v}}{\sum_{v' \in D(u)} b_{u,v'}}$$

as the probability that a packet transmitted by node $u$ is
forwarded to a descendant node $v \in D(u)$. In addition, we
write the probability that a packet is deleted in a node due to
buffer overflow as

$$\beta(u) = \begin{cases} 1 - \dfrac{b_o(u)}{b_i(u)}, & b_o(u) \le b_i(u) \\ 0, & b_o(u) > b_i(u) \end{cases}$$

Recall that $b_o(u)$ and $b_i(u)$ are respectively the cumulative
outgoing and incoming bandwidth of node $u$. Then, a packet
sent by node $u$ might not reach the client $c$ due to one of
the following three causes. The packet can be lost during its
transmission to the child node $v$ or it can be lost at the node $v$
due to buffer overflow. Finally, it can be lost along with all its
possible copies during the transmission from the child node $v$
to the client $c$. Overall, the probability $e_c(u)$ is given as

$$e_c(u) = \sum_{v \in D(u)} p_{u,v} \cdot \left\{ \pi_{u,v} + (1 - \pi_{u,v}) \cdot \beta(u) \right\} + \sum_{v \in D(u)} p_{u,v} \, (1 - \pi_{u,v}) \cdot (1 - \beta(u)) \cdot e_c(v)^{R(v)} \tag{7}$$

The probability $e_c(u)$ can be computed recursively backwards,
starting from the clients up to the server or NC nodes.
Specifically, we first set to zero the loss probabilities for all
the clients. Then, all nodes in the directed acyclic subgraph
are visited backwards. Each node is visited only when all its
children nodes have been processed already. Once $e_c(u)$ is
known, we can compute the rate of useful packets received by
the client $c$ from a source or an NC node by multiplying the
respective outgoing rate by the probability of correct reception
$1 - e_c(u)$.
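The backward recursion can be sketched in Python as follows. The graph representation is hypothetical: each edge carries its forwarding probability, its link loss rate and the buffer-overflow drop probability that applies on that hop, so the caller decides which node's overflow term is used. Memoized recursion ensures that a node is finalized only once all of its children have been processed.

```python
def loss_to_client(edges, client, replication):
    """Return a dict {node: probability that a packet sent by node misses the client}.

    edges[u] is a list of (v, p_uv, pi_uv, beta) tuples: forwarding probability,
    link loss rate and buffer-overflow drop probability for that hop.
    replication[v] is the number of copies that v sends per packet (default 1).
    """
    loss = {client: 0.0}  # the client itself loses nothing

    def visit(u):
        if u in loss:
            return loss[u]
        total = 0.0
        for v, p_uv, pi_uv, beta in edges.get(u, []):
            lost_on_hop = pi_uv + (1 - pi_uv) * beta
            delivered = (1 - pi_uv) * (1 - beta)
            # The packet is lost downstream only if every copy sent by v is lost.
            all_copies_lost = visit(v) ** replication.get(v, 1.0)
            total += p_uv * (lost_on_hop + delivered * all_copies_lost)
        # A node without any outgoing edge cannot reach the client.
        loss[u] = total if edges.get(u) else 1.0
        return loss[u]

    for node in list(edges):
        visit(node)
    return loss
```

For a chain with link loss rates 0.1 and 0.2 and no replication, the result for the first node is $1 - 0.9 \cdot 0.8 = 0.28$, which is the end-to-end loss one would expect.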

Next, we define $N_c(u)$, the number of packets received by
any node $u$ that are potentially useful for the client $c$. These
packets correspond to the data that reaches the node $u$ and
contains information that is potentially useful for the client
$c$. It depends on the paths connecting the source nodes and
the NC nodes to the node u, i.e., the subgraphs colored in 
green in Fig. 4. It also depends on the set of SF nodes on 
these paths. The rate of useful packets transmitted by the 
sources corresponds to their outgoing bandwidth, as they are 
able to generate any number of different packets via network 
coding. However, the useful rate in NC nodes may be smaller 
than their outgoing bandwidth, as they may have only part of 
the source information in their buffer. Finally the number of 
useful packets in SF nodes cannot be larger than the number 
of incoming packets. It is however difficult to estimate in a 
direct way, so that the estimation of the delay based on the 
useful rate sent by network nodes becomes hard. We therefore 
propose below to compute the delay in a recursive manner. 

B. Decoding delay 

In this section, we estimate the delay $t_c$ at a client node
$c$. The decoding delay depends on the rate of useful packets
received from the multiple sources or network coding nodes.
We estimate the time necessary to form a system of full rank
$G$ at the decoder, where $G$ is the generation size in packets.
In practice, the client might need to collect a slightly larger
number of packets, $\tilde{G} = G/(1 - x)$ [4], for forming a system
with $G$ innovative packets. This is due to the possibility that
useful packets from a source might still be redundant and not
completely independent of packets generated by other sources.
The exact value of the overhead factor $x$ depends on the coding
system (e.g., it can be upper-bounded by $1/q$ for RNC, where $q$
is the GF size). However, our analysis is relative and compares
different configurations to select the option that leads to the
minimal decoding delay. It is therefore equivalent to
work with $G$ or $\tilde{G}$, since the solutions that lead to the fastest
delivery of $G$ and $\tilde{G}$ packets, respectively, are identical. We
choose to work with $G$ in the rest of this section.

We compute the average decoding delay by first estimating 
the time necessary to collect enough packets from each source 
or NC node independently. Under the assumption that each one 
of these multiple collection processes represents a uniform 
flow of packets, we can finally approximate the expected 
decoding delay as the time necessary for the collection of a 
sufficient number of packets from multiple independent flows. 
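One simple way to read this approximation is sketched below in Python. It assumes that a flow which alone needs time $t$ to deliver the generation contributes a constant rate proportional to $1/t$, and that the rates of independent flows add up. The function name is hypothetical, and the harmonic combination is an interpretation of the uniform-flow assumption rather than a formula given in the text.

```python
def combined_delay(individual_delays):
    """Delay to gather one generation from several independent uniform flows.

    individual_delays[i] is the time flow i alone would need to deliver
    the generation; its rate is therefore 1 / individual_delays[i]
    generations per unit time.
    """
    total_rate = sum(1.0 / t for t in individual_delays)
    return 1.0 / total_rate
```

Two flows that would each need 2 time units alone deliver the generation in 1 time unit together, and adding any further flow can only reduce the estimated delay.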

The complete algorithm for computing the decoding delays
is given in Algorithm 1. Note that the algorithm uses an
iterative procedure to compute the decoding delays, since the
equivalent packet replication rate in an SF node (see Section III)
cannot be exactly computed at first. The algorithm initializes
the replication rate to an average value given by the input and
output bandwidths of each node, and refines this value along
with the successive decoding delay estimations. The NC nodes
are examined in the order of their proximity to the sources, i.e.,
the nodes that are closer to the sources are processed first. The
number of useful packets $N_c(u)$ is computed recursively at all
NC nodes, starting from those that are close to the sources.
Then the algorithm considers NC nodes that receive packets
from NC nodes that have been already visited. This specific
procedure is applicable in our framework as we consider the
iterative selection of network coding nodes. Nodes are checked
in a greedy way and the algorithm improves at each stage
the current solution by the selection of the additional network
coding node that brings the largest delay reduction. The overall
algorithm typically converges after only a few iterations.

We now describe the delay estimation algorithm in more
detail. Steps 7-11 of Algorithm 1 correspond to the
estimation of the useful rate $N_c(u)$ in NC nodes, which is
necessary for computing the initial replication rate of the node
$u$. As this rate is difficult to estimate in a direct way, we choose
a differential method and compare the delays $t_c(u)$ observed
by the client when the node $u$ is active and when it is
silent. From the delay difference $\Delta t_c(u)$ we compute the rate
difference $\Delta N_c(u)$ that is the useful rate at node $u$. Finally,
the rate $N_c(u)$ of the packets at node $u$ that are useful for the
client $c$ is computed by solving

$$\Delta N_c(u) = \begin{cases} N_c(u) \cdot (1 - e_c(u)), & b_o(u) \ge b_i(u) \\ N_c(u) \cdot \dfrac{b_o(u)}{b_i(u)} \cdot (1 - e_c(u)), & b_o(u) < b_i(u) \end{cases} \tag{8}$$

where the first and second conditions correspond to the cases
when the node $u$ in SF mode has a small incoming and a small
outgoing bandwidth, respectively. Note that, when the network does
not contain any NC nodes, the rate of useful packets that are
transmitted is simply equal to the output bandwidth of the
sources. In this case, the useful rate received by the client is
simply $N_c(s) = b_o(s) \cdot (1 - e_c(s))$.

Then we compute the delay due to packets sent by the
different sources and NC nodes in the network. We consider
two cases. First, we consider the NC nodes that have limited
incoming bandwidth (i.e., $N_c(u) < b_o(u)$ in line 12 of Alg. 1)
and the sources with outgoing bandwidth larger than the source
rate. The probability of generating useful packets in such nodes
evolves as the buffer fills. We start by estimating the number
of different packets received by the client $c$ when the node
$u$ is the only source of information. On average, the node $u$
sends $\nu(u) = b_o(u)/N_c(u)$ packets in the inter-arrival time of two
consecutive packets in its buffer, which are combinations of the
same set of input packets. Out of these $\nu(u)$ packets, $k$ packets
can be considered as useful for the client $c$ if the decoding
system has a rank deficiency of $k < \nu(u)$. Furthermore, due
to packet losses and bandwidth variations in the network, each
of the packets generated by the node $u$ arrives at the client $c$
with probability $1 - e_c(u)$. The probability $A_k(u)$ that $k$ out of
the $\nu(u)$ packets arrive at the client $c$ is

$$A_k(u) = \binom{\nu(u)}{k} \, (1 - e_c(u))^{k} \, e_c(u)^{\nu(u) - k} \tag{9}$$

Note that, in general, $\nu(u)$ does not have an integer value. We
therefore interpolate between the values of $A_k(u)$
evaluated at the integer values nearest to $\nu(u)$.
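The arrival probability and its interpolation can be sketched in Python as follows. The text only states that an interpolation is performed; the linear interpolation between the two nearest integers used here is an assumption, as are the function and parameter names.

```python
import math


def arrival_prob(k, nu, loss_prob):
    """Probability that k out of nu transmitted packets reach the client.

    nu may be non-integer; in that case the binomial probability is
    interpolated linearly (an assumed choice) between floor(nu) and ceil(nu).
    loss_prob is the per-packet loss probability towards the client.
    """
    def binom_pmf(n):
        if k > n:
            return 0.0
        return math.comb(n, k) * (1 - loss_prob) ** k * loss_prob ** (n - k)

    lo, hi = math.floor(nu), math.ceil(nu)
    if lo == hi:
        return binom_pmf(lo)
    weight = nu - lo
    return (1 - weight) * binom_pmf(lo) + weight * binom_pmf(hi)
```

A convenient property of the linear choice is that the interpolated values still sum to one over $k$, since they are a convex combination of two binomial distributions.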

We then consider the probability $P_c(u,r,n)$ for the client
$c$ to collect $r$ useful packets from data sent by a node $u$ that
possesses $n$ useful packets. This probability can be computed
recursively as

$$P_c(u,r,n) = \sum_{k=0}^{r} P_c(u, r-k, n-1) \cdot A_k(u), \quad \forall r \in [1..n-1] \qquad (10)$$

Eq. (10) holds for $r < n$. When $r = n$, it becomes

$$P_c(u,n,n) = \sum_{k=1}^{n} P_c(u, n-k, n-1) \cdot \sum_{l=k}^{\lceil \nu(u) \rceil} A_l(u) \qquad (11)$$

We further denote by $V_c(u,r,n)$ the probability that the
client $c$ collects $r$ useful packets precisely due to the arrival
of the $n$-th useful packet in the sending node $u$. It can be
written as



$$V_c(u,r,n) = \sum_{k=1}^{r} P_c(u, r-k, n-1) \cdot \sum_{l=k}^{\lceil \nu(u) \rceil} A_l(u) \qquad (12)$$



where $P_c(u, r-k, n-1)$ includes all possible events that lead
the node $u$ to collect packets with rank $r-k$ when it receives
the $(n-1)$-th useful packet for client $c$.
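
The recursion of Eqs. (10) and (11) can be sketched in Python as follows. This is only an illustration under stated assumptions: the base case $P_c(u,0,0) = 1$, the extension of Eq. (10) to $r = 0$, and the upper summation limits (terms beyond them vanish) are not given explicitly in the text, and the name `client_rank_pmf` is hypothetical. The input `A` is a list with `A[k]` $= A_k(u)$.

```python
def client_rank_pmf(A, n_max):
    """Return P with P[n][r] = P_c(u, r, n) for 0 <= r <= n <= n_max.

    A[k] is the probability that k packets reach the client during one
    inter-arrival interval (Eq. (9)); the entries of A must sum to 1.
    """
    def tail(k):
        # Probability that at least k packets arrive in one interval.
        return sum(A[k:])

    P = [[1.0]]  # assumed base case: P_c(u, 0, 0) = 1
    for n in range(1, n_max + 1):
        prev = P[n - 1]
        row = [0.0] * (n + 1)
        for r in range(n):  # Eq. (10), also applied to r = 0
            row[r] = sum(prev[r - k] * A[k]
                         for k in range(min(r, len(A) - 1) + 1))
        # Eq. (11): the client cannot hold more than the n packets of node u,
        # so every arrival count of at least k closes a gap of k.
        row[n] = sum(prev[n - k] * tail(k) for k in range(1, n + 1))
        P.append(row)
    return P
```

Under these assumptions each row `P[n]` is a probability distribution over the number of useful packets at the client, which gives a simple sanity check on an implementation.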

The arrival time of the $n$-th useful packet at the sending node
$u$ can be computed from the useful packet rate $N_c(u)$. We
assume that $N_c(u)$ represents a constant rate and that the
arrival times of packets in node $u$ are uniformly distributed.
Now, one can compute the expected number of useful packets
$E_c(u)$ that are necessary at the sending node $u$ for the client
$c$ to receive $G$ useful packets. It is expressed simply as



$$E_c(u) = \sum_{p=1}^{\infty} p \cdot V_c(u, G, p) \qquad (13)$$



The decoding delay for the client $c$ when the NC node $u$ is
the only source of information can be estimated by dividing
the expected number of necessary packets by the rate of useful
packets, i.e., by multiplying it by the inter-arrival
time between two useful packets. It is written as

$$t_c(u) = \frac{E_c(u)}{N_c(u)} \qquad (14)$$

Algorithm 1 Delay computation algorithm
 1: Initialize replication rates for every node: $R(u)$
 2: repeat
 3:   for each client node $c$ do
 4:     Compute $\epsilon_c(u)$ for sending nodes $u$ from Eq. (7).
 5:     Compute $N_c(s) = b_o(s) \cdot (1 - \epsilon_c(s))$ for all sources $s$.
 6:     for each NC node $u$ do
 7:       Compute $t_c(u)$ using Eqs. (10)-(14), setting node $u$ in SF mode.
 8:       Compute $t_c(u)$ using Eqs. (10)-(14), setting node $u$ in silent mode.
 9:       Compute the difference $\Delta t_c(u)$ between the delays obtained in the two modes.
10:       Compute $\Delta N_c(u) = 1/\Delta t_c(u)$.
11:       Compute $N_c(u)$ using Eq. (8).
12:       if $N_c(u) < b_o(u)$ then
13:         Compute $t_c(u)$ using Eqs. (10)-(14), setting node $u$ in NC mode.
14:       else
15:         Compute the expected decoding delay $t_c(u)$ using Eq. (15).
16:       end if
17:     end for
18:     Compute the average decoding delay $t_c$ considering all sources and NC nodes simultaneously, with Eq. (16).
19:   end for
20:   for each SF node $u$ do
21:     Estimate the total number of packets received by each node, per generation: $|\mathcal{N}(u)| = b_i(u) \cdot \max_c t_c$.
22:     Update the replication rate $R(u)$ with Eq. (6).
23:   end for
24: until convergence of $t_c$




Then, we consider the sources and the NC nodes that are
over-provisioned in bandwidth (i.e., the set of nodes $u$ where
$N_c(u) > b_o(u)$, line 14 in Alg. 1). We assume that they
transmit packets that are all potentially useful for the client $c$.
The number of useful packets from node $u$ that reach the client
$c$ in this case is given by the rate $N_c(u) = b_o(u) \cdot (1 - \epsilon_c(u))$.
When this rate is uniform, the decoding delay when the node
$u$ is the only source of information is given by:



$$t_c(u) = \frac{G}{b_o(u) \cdot (1 - \epsilon_c(u))} \qquad (15)$$

Finally, the average decoding delay at client $c$ is computed
by considering all the sources and NC nodes as independent
sources of information with uniform useful rates $1/t_c(u)$. We
can write the decoding delay as

$$t_c = \frac{1}{\sum_{u \in S} \dfrac{1}{t_c(u)}} \qquad (16)$$



where S is the set of sources and NC nodes. 
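
For concreteness, the over-provisioned case of Eq. (15) and the combination rule of Eq. (16) can be sketched in a few lines of Python. The function names are hypothetical and the numbers in the usage note are illustrative only, not values from the experiments.

```python
def single_source_delay(G, b_out, eps):
    """Eq. (15): delay to collect G useful packets from one over-provisioned
    node with outgoing bandwidth b_out (packets/s) and loss probability eps."""
    return G / (b_out * (1.0 - eps))


def combined_delay(delays):
    """Eq. (16): every node u contributes a useful rate 1 / t_c(u); the
    rates add up and the resulting delay is the inverse of their sum."""
    return 1.0 / sum(1.0 / t for t in delays)
```

For example, two independent senders that would each need 2 s on their own yield `combined_delay([2.0, 2.0]) == 1.0`, i.e., the delay is halved.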



V. Selective placement of NC nodes

Equipped with methods to estimate the decoding delay on 
the overlay network, we can design algorithms to decide which 
nodes in the network should perform network coding. We 
address the problem of placing $A$ network coding nodes in the
overlay network, such that the average delay observed by the
clients is minimized. This is typically achieved by selecting
network coding nodes such that the packet replication rate 
is decreased and the innovative flow rate in the network is 
increased. 

However, the optimal selection of the NC nodes is known
to be an NP-hard problem [10]. Hence, we design a greedy
approach that iteratively searches for the optimal placement
of a new network coding node while all the previously added
NC nodes are fixed. The candidate nodes for implementing
network coding are all the remaining SF nodes in the overlay
network. Our node selection algorithm examines all the SF
nodes backwards from the clients to the servers. It selects the
SF node whose transformation into an NC node brings the
highest benefit for the clients. This procedure is repeated until
all the $A$ NC nodes have been selected.

We now propose two variants of the iterative selection algorithm
that both use Algorithm 1 for computing the innovative
flow rate, but differ in their view of the network resources. The
first algorithm assumes that a central node possesses a global
knowledge of the network; it iteratively selects the network
coding nodes in a centralized manner. When global network
knowledge is not a reasonable assumption, the centralized
algorithm still serves as a performance benchmark for other
greedy NC node placement algorithms. The second algorithm
uses only a local view of the network resources at each node
for computing the gains in innovative rate and decoding delay.
This algorithm is more realistic in practice and can
be implemented in a distributed way.

In more detail, the centralized algorithm uses the knowledge
about the full network and available resources in order to
determine the number of innovative packets received by each
client. It leads to the iterative selection of $A$ NC nodes by
computing at each stage the benefit of turning any of the SF
nodes into an NC node. The candidate node that brings the
highest innovative flow rate with its transformation is selected
as a new NC node. The algorithm is described in Algorithm 2.



Algorithm 2 Centralized NC node selection
1: for $i = 1$ to $A$ do
2:   for each node $u$ in the set of SF nodes do
3:     Turn temporarily $u$ into an NC node.
4:     Estimate the average decoding delay at the clients $t_c$ (using Algorithm 1).
5:     Turn $u$ back into an SF node.
6:   end for
7:   Select the node $u^*$ that minimizes the decoding delay, i.e., $u^* = \arg\min_u \sum_c t_c$
8:   Turn permanently $u^*$ into an NC node.
9: end for
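
The greedy loop of Algorithm 2 can be sketched in Python as below. This is a minimal illustration, not the implementation used in the experiments: the callable `total_delay` is a hypothetical stand-in for Algorithm 1 that returns $\sum_c t_c$ for a given set of NC nodes, and ties are broken by node order (an assumption).

```python
def greedy_nc_selection(sf_nodes, budget, total_delay):
    """Greedily pick up to `budget` NC nodes among `sf_nodes`.

    total_delay(nc_set) is assumed to return the sum over the clients of the
    estimated decoding delay when exactly the nodes in nc_set perform coding.
    """
    nc_nodes = set()
    candidates = set(sf_nodes)
    order = []
    for _ in range(budget):
        if not candidates:
            break
        # Try each remaining SF node as an NC node and keep the best one.
        best = min(sorted(candidates),
                   key=lambda u: total_delay(nc_nodes | {u}))
        nc_nodes.add(best)
        candidates.remove(best)
        order.append(best)
    return order
```

Each outer iteration calls the delay estimator once per remaining SF node, so placing $A$ nodes among $n$ candidates costs on the order of $A \cdot n$ runs of Algorithm 1.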



The second algorithm relaxes the assumption that a central 
node is aware of the full network status. Instead, the nodes 
only use local network information for the estimation of the 
innovative flow rate. We define a neighborhood around each 
node. Then, an algorithm similar to the centralized solution 
above is applied in each neighborhood in order to determine 
the benefits of turning SF nodes into NC nodes. In particular,
each node uses the estimation of the reception probability
$\epsilon_c(u)$ that is given by all nodes $u$ in the neighborhood and
computes an estimation of the decoding delay based on local
information, i.e., the capacities and the loss rates of the
sub-network around the node $u$. Note that $\epsilon_c(u)$ is also calculated
considering only the statistics of node $u$'s neighborhood. These
estimations are transmitted periodically to a central agent, 
which finally makes the decision on the placement of new 
NC nodes. The procedure is summarized in Algorithm 3. 

Algorithm 3 NC node selection with local information
1: for $i = 1$ to $A$ do
2:   for every SF node $u$ do
3:     Temporarily transform node $u$ into an NC node.
4:     Estimate the average delay $t_c(u)$ at the clients using Algorithm 1 with local information.
5:     Transform node $u$ back into an SF node.
6:     Transmit $t_c(u)$ to a central agent.
7:   end for
8:   Select the node $u^*$ that maximizes the innovative rate, $u^* = \arg\max_u \sum_c N_c(u)$
9:   Turn permanently $u^*$ into an NC node.
10: end for

Both algorithms select a few network coding
nodes in the system, such that the coding delay and overall
computational complexity in the network are limited. At the
same time, the system maintains a high innovative rate for
sustained streaming performance. The choice of the number
of NC nodes is typically determined by the admissible delay or
tolerable complexity in the network. For example, constraints
on decoding delay impose a limit on the maximum number of
NC nodes in the system. However, the problem of determining
the optimal number of NC nodes is out of the scope of this
paper. We rather assume that the number of coding nodes or
helpers in the streaming system is given a priori. The proposed
algorithms then solve the problem of efficiently placing the NC
nodes in the overlay network.

Finally, it has to be noted that the second algorithm is not 
fully distributed, as it still uses a central agent to select the 
NC nodes. However, since it uses only local information, the 
proposed solution is certainly amenable to a fully distributed 
algorithm. One could imagine that each node decides inde- 
pendently if it should implement network coding or not, by 
comparing the local estimation of the gain in innovative rate to 
a pre-defined threshold. Alternatively, a distributed consensus 
solution could be deployed for a coordinated selection of the 
NC nodes with minimal information exchange between the 
overlay nodes. 
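
The threshold rule mentioned above could look as follows. This is only a sketch of the idea under assumptions: the two dictionaries of locally estimated innovative rates and the name `local_nc_decisions` are hypothetical, and no such fully distributed variant is evaluated in this paper.

```python
def local_nc_decisions(rate_as_sf, rate_as_nc, threshold):
    """Return the nodes that would switch to network coding on their own.

    rate_as_sf[u] and rate_as_nc[u] are node u's local estimates of the
    innovative rate when it forwards and when it codes, respectively.
    """
    return {u for u in rate_as_sf
            if rate_as_nc[u] - rate_as_sf[u] > threshold}
```

Unlike Algorithms 2 and 3, such a rule does not bound the number of NC nodes; the threshold would have to be tuned to reach a target number of coding nodes.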

VI. Simulation results 

A. Setup 

In this section, we analyze the performance of the proposed 
NC node selection algorithms for the transmission of video 
streams in overlay networks. We generate overlay networks
based on realistic network bandwidth values and adjacency
measurements from PlanetLab [11], as provided in a snapshot
of their network taken on 24 Nov. 2009 by their Scalable
Sensing Service ($S^3$) [12]. The networks under consideration
have one source node, three client nodes and a variable number 
of intermediate nodes. We create network topologies in the 
following way. First, the source nodes are positioned, then the 
nodes are randomly added one-by-one to the topology. For 
every new node, four nodes are randomly chosen as parent 
nodes. However, if the new node is not directly connected 
to any of the selected parents according to the Planet Lab 
measurement data, the node is removed and a new node is 
selected. After all nodes have been added to the network, the
nodes that cannot be reached by the source and the nodes that
are not connected with any client are removed. The resulting
network graphs are directed acyclic by construction. The edge
capacity of each link is set to 1/200 of the Planet Lab capacity
values, in order to get realistic values for the link bandwidth.
Finally, the packet loss rate of each link is set to 5%.
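
The topology construction described above can be sketched as follows. The measurement data are not reproduced here, so the predicate `connected(a, b)` is a hypothetical stand-in for the Planet Lab adjacency information; linking a new node only to those sampled parents it is connected to is also an assumption.

```python
import random


def reachable(start, adj):
    """Nodes reachable from the nodes in `start` along `adj` (node -> set)."""
    seen, stack = set(start), list(start)
    while stack:
        for v in adj.get(stack.pop(), ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def build_overlay(source, others, clients, connected, n_parents=4, seed=0):
    """Return the directed edges of a random overlay rooted at `source`."""
    rng = random.Random(seed)
    placed, edges = [source], set()
    pool = list(others)
    rng.shuffle(pool)
    for node in pool:
        parents = rng.sample(placed, min(n_parents, len(placed)))
        links = {(p, node) for p in parents if connected(p, node)}
        if links:  # otherwise the node is discarded
            edges |= links
            placed.append(node)
    # Prune nodes the source cannot reach or that lead to no client.
    fwd, bwd = {}, {}
    for a, b in edges:
        fwd.setdefault(a, set()).add(b)
        bwd.setdefault(b, set()).add(a)
    keep = reachable([source], fwd) & reachable(clients, bwd)
    return {(a, b) for a, b in edges if a in keep and b in keep}
```

Edges always point from an earlier-placed node to a later-placed one, which is why the resulting graph is acyclic by construction.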

The network coding operations are performed in a Galois 
field of size GF(2^) since this field size has been shown to 
result in a good compromise between performance (packets 
are innovative with high probability) and information overhead
[4]. The generation size is set to 32 packets, which is
reasonable for real-time video streaming applications. The
packet size is 512 bytes. The decoding is performed by
Gaussian elimination. Since we are interested in analyzing
the performance in terms of decoding delay, we compute the
average delay as the time needed by each client to receive 32
linearly independent packets (i.e., a complete generation). This
is the minimal number of packets for the clients to decode the 
source information. We use a set of 10 random networks that 
consist of 32 to 56 nodes with two different radii of 6 and 8 
hops (i.e., the maximum distance between any pair of nodes in 
the network is 6 or 8 hops). We consider the placement of up 
to 10 NC nodes in each network. In each case, the performance 
results are averaged over 100 simulations performed using the 
NS-3 [13] network simulator. 

We evaluate the performance of the centralized and semi-distributed
algorithms (Algorithm 2 and Algorithm 3, denoted
as "ALG02" and "ALG03", respectively). We
compare them with a greedy search algorithm (ALG02R)
that is similar to Algorithm 2 but uses the actual delays obtained
by NS-3 simulations instead of the delay estimates from
Algorithm 1 for the node selection. We also compare to a
baseline scheme, referred to as "RANDSEL" in the following, that
randomly places the NC nodes in the network. For the sake of
completeness, we finally study the performance of a scheme
where all nodes are NC nodes (this scheme is called "All nodes
NC"), a Linear NC scheme [4] and a scheme with only SF
nodes. The performance of the latter is equal to the theoretical
maximum that can be sustained when only routing is enabled
and can be found by typical maximum flow algorithms.

Note that the minimum delay and maximum effective
throughput obtained with Linear NC are computed by considering
a high-capacity hyper-source node connected with
all sources. In this case, the overall throughput is computed
as the sum of the throughputs from the hyper-source to each
client node. For the routing case, we consider as well a hyper-sink
node linked with all client nodes, and we compute the
maximum throughput between the hyper-source and hyper-sink
with standard graph maximum flow algorithms. We
should point out that the links connecting the hyper nodes
with the network's sources and clients are error free and have
infinite capacities, so they do not introduce extra delays.
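
The two reference bounds can be computed with any maximum flow routine. The sketch below uses a plain Edmonds-Karp implementation as one possible choice (the text only says "standard graph maximum flow algorithms"); the node labels `"S*"` and `"T*"` for the hyper-source and hyper-sink and all function names are hypothetical.

```python
from collections import deque


def max_flow(cap, s, t):
    """Edmonds-Karp maximum flow; cap maps directed edges (u, v) to capacity."""
    res = dict(cap)
    for (u, v) in cap:
        res.setdefault((v, u), 0.0)  # residual (reverse) edges
    adj = {}
    for (u, v) in res:
        adj.setdefault(u, []).append(v)
    flow = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in parent and res[(u, v)] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[(parent[v], v)])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            res[(u, v)] -= bottleneck
            res[(v, u)] += bottleneck
            v = u
        flow += bottleneck


def with_hyper_nodes(cap, sources, clients):
    """Add error-free, infinite-capacity links to a hyper-source and hyper-sink."""
    ext = dict(cap)
    for s in sources:
        ext[("S*", s)] = float("inf")
    for c in clients:
        ext[(c, "T*")] = float("inf")
    return ext


def routing_bound(cap, sources, clients):
    """Routing only: maximum flow from the hyper-source to the hyper-sink."""
    return max_flow(with_hyper_nodes(cap, sources, clients), "S*", "T*")


def linear_nc_bound(cap, sources, clients):
    """Linear NC: sum of the maximum flows from the hyper-source to each client."""
    ext = with_hyper_nodes(cap, sources, clients)
    return sum(max_flow(ext, "S*", c) for c in clients)
```

The sketch assumes that no source is itself a client; otherwise a path made only of infinite-capacity links would exist and the flow would be unbounded.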

B. Decoding delay 

We first study the decoding delay for each of the algorithms.
Figs. 5 (a) and (b) illustrate the normalized average decoding
delays for the network clients as a function of the number
of NC nodes added in the network, for two different network
sizes. The decoding delays are normalized to the performance
obtained when all nodes perform network coding. We show
the performance of Algorithm 3 for different sizes of the
neighborhood in the local gain estimations. We observe a sharp
reduction of the delivery times with the addition of the first
few NC nodes for all the algorithms, but especially for the
proposed algorithms. The gains become less important after
a few NC nodes have been placed in the network. We can
thus see that the NC nodes are well positioned in order to
improve the delivery performance. The results also highlight
the inefficiency of the RANDSEL algorithm, which becomes
competitive with the other methods only for a large number of
NC nodes. Finally, we can see that Algorithm 2 performs
similarly to ALG02R. This confirms that the proposed delay
estimation strategy in ALG02 is accurate, as it comes close to
the actual delay values measured by the network simulator.

C. Innovative rate 

We further look at the average normalized effective throughput
in the network (i.e., the number of useful packets received
by the clients), as a function of the number of NC nodes
in the network. Normalization is performed with respect to 
the effective throughput achieved by the scheme where all 
nodes perform network coding. Figs. 6 (a) and (b) show 
the effective throughputs for two different network radii. The 
results confirm the earlier observations on the decoding delay 
performance. A few well selected nodes are able to bring 
a large throughput gain. Further performance improvements
become less important as the number of NC nodes increases.
We see also that the algorithms proposed in this paper provide 
the best performance among the schemes under comparison. 
The node selection algorithm with local information improves 
with the size of the neighborhood but generally stays close to 
the centralized algorithm when the neighborhood is sufficiently 
large. In the case where the neighborhood is limited to one 
node, the proposed algorithm still outperforms RANDSEL 
since the decisions are not totally blind. The reason for the 
inferior performance of the semi-distributed scheme compared 
to the centralized one simply comes from the fact that the local 
network statistics are not sufficient for accurately estimating 
the delay when the neighborhood is small. In addition, we can 
observe that a few NC nodes are sufficient for obtaining higher
throughputs than those of routing algorithms where the nodes
simply forward packets randomly to their descendants. This
confirms that our methods are appropriate for deployment
in low-cost networks, where a few helpers or network coding
nodes are sufficient for improved throughput and efficient data
delivery. Finally, we can observe that our algorithms tend to
perform better in larger networks (i.e., larger radius values), as
they are able to exploit more efficiently the available resources
and the diversity in the overlay network.

D. Video quality 

Finally, we study the performance of the video delivery
schemes from the viewpoint of video quality. We estimate the
average PSNR quality measured at the clients with respect to
the number of NC nodes in the network for all methods under
comparison in the transmission of the Foreman CIF sequence
encoded with the JM12.2 reference software [14] of the H.264/AVC standard
[15]. The quality is estimated by setting the encoding rate to
the value of the network throughput in the different schemes.
The corresponding results are illustrated in Figs. 7 (a) and (b)
for networks with a radius of six and eight hops, respectively.
The improved throughput values translate into
higher PSNR quality, which confirms the above observations
about the benefits of proper NC node selection. We observe
gains that exceed 1.5 dB with only two NC nodes, whereas we
reach gains of 3 dB for seven NC nodes. The PSNR gains saturate
as the number of NC nodes increases, but quality gains can
still be noticed. As expected, larger gains are observed for the
centralized algorithm; however, the semi-distributed algorithm
offers significant gains as well. For a neighborhood of three
hops, the performance of the semi-distributed scheme even becomes
identical to that of the centralized scheme. Finally, we see
that RANDSEL gives small PSNR gains for a few NC nodes,
which confirms the poor performance of a random selection
of the network coding nodes and supports the development of
effective selection algorithms.

Fig. 5: Normalized average decoding delays versus the number of NC nodes, in networks with maximum distance between
any pair of nodes equal to: (a) six and (b) eight.

Fig. 6: Normalized achievable throughput versus the number of NC nodes, in networks with maximum distance between any
pair of nodes equal to: (a) six and (b) eight.



VII. Related work 

The problem of finding a minimal set of network coding 
points in a network has been mostly studied from a theoretical
perspective so far. First, the special case of two source
messages is examined in [16], where it is proved that the
number of coding nodes is independent of the total number 
of network nodes. In [17], the minimum number of network 
coding nodes is computed through graph coloring techniques. 
It is then shown in [18] that the number of coding nodes is 
upper bounded by the number of receivers. A unification of 
network coding and tree packing theorems is further presented 
in [19], where network coding is restricted to pre-selected 
edges. These include only input edges of relay nodes and not 
the input edges of clients where simple routing is applied. This 
choice is made in order to achieve the min-cut max-flow limit 
of the network and save both processing and implementation 
complexity. The relation between link capacities and the
number of coding nodes is investigated in [20], where it is
shown that in directed acyclic networks arbitrary amounts of
gain can be noticed when subsets of nodes of arbitrary size are
used for coding. Finally, the problem of finding network codes
with a minimal number of encoding nodes has recently been
studied in [10], where the optimization problem is however
shown to be NP-hard.

Fig. 7: Average PSNR quality at clients versus the number of NC nodes, in networks with maximum distance between any
pair of nodes equal to: (a) six and (b) eight.

While the previous works mostly consider that the network
is fully known at a central node, a decentralized algorithm for
minimizing the number of network coding packets flowing in
a network has been presented in [21]. It also addresses the
design of capacity-approaching network codes that minimize the
set of network coding nodes. However, this algorithm does not
provide any guarantee that the minimum set of network coding
nodes can always be determined. While [21] considers
capacity-approaching codes without delay constraints, we rather use
well performing network codes and consider the available
resources in the network in order to select a set of network 
coding nodes, such that the overall delay is kept small in 
multimedia applications. The choice of randomized network 
codes is mostly geared towards the implementation of practical 
distributed systems where large benefits are expected by the 
proper choice of a limited number of network coding nodes. 

In general, the previous works about the selection of coding 
nodes do not consider delay issues, which are most important 
in streaming applications. The problem of the selection of net- 
work processing nodes in multimedia streaming applications 
has been addressed in [22] in a framework that is, however,
slightly different from ours. The placement of a limited number
of network-embedded FEC nodes (NEF) is considered in 
networks that are organized into multicast trees. The placement 
is chosen in order to enhance the robustness to transmission 
errors and to improve the network's throughput. NEF nodes 
first decode and successively re-encode the recovered packets 
in order to increase the symbol diversity. A greedy algorithm 
is proposed for placing NEF nodes. Although the proposed
method is efficient, it is computationally expensive and
impractical to deploy in dynamic networks. In contrast to
[22], we consider the placement of processing nodes in the 
more general case of overlay mesh networks with randomized 
network coding for distributed packet delivery. 

Finally, game theoretic concepts are adopted in a recent
work [23] for developing socially optimal distributed algorithms
that decide on the nodes that should combine packets.
Specifically, incentives such as extra download bandwidth are 
given to network nodes in order to change their status to 
network coding and indirectly minimize the delays in the 
system. However, this algorithm does not offer any guarantee 
that limited resources will be used efficiently, since all the 
nodes may potentially desire to become network coding nodes. 
It is not appropriate when a certain number of network coding 
nodes shall be placed effectively in a network. 



VIII. Conclusions 

We have considered the problem of the placement of a
predefined number of network coding nodes in an overlay media
streaming system. We have proposed novel algorithms that
iteratively select the best nodes for network coding such that
the delay is decreased. The deployment of network coding
thus appears as a valid solution for exploiting the network
diversity in streaming applications. We show that the selection
of a small number of network coding nodes is able to provide
an effective tradeoff between packet duplicates, decoding delay
and computational complexity. The experimental evaluation in
irregular and realistic networks shows that the proposed node
selection schemes achieve the same throughput as a system
where all nodes perform network coding, but with a dramatically
smaller number of network coding nodes and hence
lower complexity. In addition, we show that the quality of
experience in video streaming applications is greatly improved
by the proper selection of network coding nodes. Finally,
the proposed algorithm is amenable to the implementation
of distributed solutions that are able to adapt to the local
characteristics of a dynamic network topology. It could also
offer important insights in the design of effective media
delivery solutions, where helper nodes could be positioned
in an overlay network for maximizing the quality of service
offered to the media clients.